Optimal Discovery of Subword Associations in Strings
Authors
Abstract
Given a textstring x of n symbols and an integer constant d, we consider the problem of finding, for any pair (y, z) of subwords of x, the number of times that y and z occur in tandem (i.e., with no intermediate occurrence of either one of them) within a distance of d symbols of x. Although in principle there might be n^4 distinct subword pairs in x, we show that it suffices to consider a family of only n^2 such pairs, with the property that for any neglected pair (y', z'), there is a corresponding pair (y, z) contained in our family and such that: (i) y' is a prefix of y and z' is a prefix of z, and (ii) the tandem index of (y', z') equals that of (y, z). We show that an algorithm for the construction of the table of all such tandem indices can be built to run in optimal O(n^2) time and space.
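To make the counted quantity concrete, the following is a minimal brute-force sketch of a tandem index for a single pair (y, z). It is not the O(n^2) table construction described in the abstract. The function names `occurrences` and `tandem_index` are hypothetical, and the sketch assumes that distance is measured between start positions, that z must start strictly after y, and that an "intermediate occurrence" is a start position of y or z strictly between the two; the paper's exact conventions may differ.

```python
from bisect import bisect_right


def occurrences(x: str, w: str) -> list[int]:
    """Return all start positions of w in x, overlapping occurrences included."""
    positions = []
    start = x.find(w)
    while start != -1:
        positions.append(start)
        start = x.find(w, start + 1)
    return positions


def tandem_index(x: str, y: str, z: str, d: int) -> int:
    """Count occurrences of y followed in tandem by z within distance d.

    Assumptions (illustrative, not taken from the paper):
      - positions are start positions, and z starts strictly after y;
      - the pair counts only if no other y or z starts strictly in between.
    """
    y_pos = occurrences(x, y)
    z_pos = occurrences(x, z)
    count = 0
    for i in y_pos:
        # Nearest start of z strictly after i; nearest means no z lies in between.
        k = bisect_right(z_pos, i)
        if k == len(z_pos):
            continue
        j = z_pos[k]
        if j - i > d:
            continue
        # Reject the pair if another y starts strictly between i and j.
        m = bisect_right(y_pos, i)
        if m < len(y_pos) and y_pos[m] < j:
            continue
        count += 1
    return count


if __name__ == "__main__":
    print(tandem_index("abcabxabc", "ab", "c", 3))
```

Each query scans the occurrence lists of one pair, so repeating it over every subword pair would be far slower than the bound stated above; the sketch only illustrates what a single table entry counts under the stated assumptions.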
Similar resources
Counting Occurrences of Some Subword Patterns, by Alexander Burstein and Toufik Mansour
We find generating functions for the number of strings (words) containing a specified number of occurrences of certain types of order-isomorphic classes of substrings called subword patterns. In particular, we find generating functions for the number of strings containing a specified number of occurrences of a given 3-letter subword pattern.
Quasiperiods, Subword Complexity and the Smallest Pisot Number
A quasiperiod of a finite or infinite string/word is a word whose occurrences cover every part of the string. A word or an infinite string is referred to as quasiperiodic if it has a quasiperiod. It is obvious that a quasiperiodic infinite string cannot have every word as a subword (factor). Therefore, the question arises how large the set of subwords of a quasiperiodic infinite string can be [...
Multi-phone strings as subword units for speech recognition
The choice of speech unit affects the accuracy, complexity, expandability and ease of adaptation of ASRs to speaker and environmental variations. This paper explores a method of subword modelling based on the concept of multi-phone strings. The motivation in using the longer duration multi-phone strings is to reduce the loss of contextual information, cross-phone correlation, and transitions. M...
On scattered subword complexity
Sequences of characters called words or strings are widely studied in combinatorics, and used in various fields of sciences (e.g. chemistry, physics, social sciences, biology [2, 3, 4, 11] etc.). The elements of a word are called letters. A contiguous part of a word (obtained by erasing a prefix or/and a suffix) is a subword or factor. If we erase arbitrary letters from a word, what is obtained...
Counting occurrences of some subword patterns
Counting the number of words which contain a set of given strings as substrings a certain number of times is a classical problem in combinatorics. This problem can, for example, be attacked using the transfer matrix method (see [20, Section 4.7]). In particular, it is a well-known fact that the generating function of such words is always rational. For example, in [20, Example 4.7.5] it is shown...
Publication date: 2004